What Are The Goals Of This Course?



To provide an overview of the AT&T WE DSP3210.



To provide the basic tools to begin writing 
applications using the DSP3210 development tools. 



To provide insight into VCOS, the DSP3210 
multi-tasking environment. 


Prerequisites: 

This course assumes a knowledge of the ’C’ 
programming language. It also assumes familiarity with 
assembly language. Some knowledge of typical DSP 
functions and applications is also helpful, but not required. 


Disclaimer: While Lavitsky Computer Laboratories, Inc. 
("LCL") has made every attempt to verify that the 
information contained in this presentation is accurate, the 
information provided herein is provided "as is" without
warranty of any kind, either express or implied. LCL assumes 
no liability for direct or indirect damages resulting from any 
defect in this information. 
Why DSPs?


Existing CISC/RISC processors lack signal processing 
architecture 

DSPs are good for performing a large number of 
repetitive mathematical operations combined with 
extreme memory bandwidth requirements 

DSPs are capable of real-time signal processing of 
real-world data (e.g. audio samples) 
Overview Of DSP3210 Architecture


• 32-Bit Floating Point DSP - 24-Bit mantissa, 8-Bit exponent 

Advantages over fixed point DSP (e.g. Motorola 56000) 

Larger dynamic range (in excess of 1500dB as opposed to < 300dB) 

IEEE P754 Floating Point Format 
Both mu-law and A-law encoding 
Up to 33 MFLOPS (66.7 MHz clock rate)

• Single Cycle Instructions 

• Serial I/O With DMA Transfer Counters up to 25Mbits/second 

Serial data transfers occur without processor intervention 
Cycles are stolen when necessary 
DMA control for serial in and serial out 

• Bit I/O - General Purpose 8-Bit I/O Port 

Provides flexible control of external hardware 

• Memory Mapped I/O (MMIO) 

Provides for future expansion 

• 2048 32-Bit Words Contiguous On-Chip RAM 

High-speed RAM for both instructions and data 

Diligent use of this memory eliminates need for expensive static RAM 

• Programmable 32-Bit Timer 

Can be used for interval timing, rate generation, event counting, or 
waveform generation 

Can generate interrupt when count reaches zero 

• Fully Vectored Interrupt Structure With Hardware Context Save 

Allows very fast interrupt processing (up to 2 million interrupts/sec) 

• Barrel Shifter 

For bit manipulation in graphics or data encryption, etc. 
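The dynamic range figures quoted above can be checked with simple arithmetic: each doubling of amplitude adds about 6.02 dB. The sketch below is only a worked calculation; the function name is hypothetical, and treating an 8-bit exponent as spanning 255 doublings is an assumption made here, not a statement from the DSP3210 documentation.

```c
/* 20*log10(2): each doubling of amplitude adds about 6.02 dB. */
#define DB_PER_DOUBLING 6.0206

/* Dynamic range in dB covered by `doublings` powers of two. */
double dynamic_range_db(int doublings)
{
    return DB_PER_DOUBLING * (double)doublings;
}
```

Under these assumptions, dynamic_range_db(255) gives roughly 1535 dB for an 8-bit exponent, while a 24-bit fixed-point word (24 doublings) gives about 144 dB and a 48-bit product about 289 dB, both below the 300 dB mark.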
DSP3210 Architecture (cont’d)


• Memory and Bus Features 

32-Bit Addressing 

Share System Bus - reduces cost 

Quad-Word Transfer Capability - Efficient memory transfers 

Selectable Byte-Ordering - Easy integration into Motorola architecture 

• Low-Power CMOS Design 


Seven functional units: 

Control Arithmetic Unit (CAU) 

Data Arithmetic Unit (DAU) 

On-chip Memory (RAM0, RAM1, Boot ROM)
Bus Interface 
Serial I/O (SIO) 

DMA Controller (DMAC) 

Timer/Status Control (TSC) 
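To illustrate what selectable byte ordering means on a shared bus, the sketch below assembles the same four bytes into a 32-bit word in both orders. It is plain host-side C with hypothetical function names; it does not show how the DSP3210 itself is configured.

```c
#include <stdint.h>

/* Assemble four bytes, most significant byte first (Motorola order). */
uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Assemble four bytes, least significant byte first. */
uint32_t load_le32(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

A processor that can select its byte order reads shared data structures the same way the host wrote them, so no swapping step like this is needed in software.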
Control Arithmetic Unit


Responsible for: 

Address calculation 
Branching control 

16/32-Bit integer arithmetic and logic operations

RISC Core Consisting of: 

32-Bit Arithmetic Logic Unit (ALU) (integer and logic) 

32-Bit Program Counter (PC) 

22 32-Bit General Purpose Registers (r0-r22) 

32-Bit Barrel Shifter 

Executes at up to 12.5 MIPS: 

Executes integer, data move, and control instructions (CA) 
Generates addresses for the operands of floating point instructions 

CA Instructions perform load/store, branching control, and 
16/32-Bit integer arithmetic and logic operations 

DA Instructions can have up to four memory accesses per 
instruction 

CAU is responsible for generating addresses using the
post-modified, register-indirect addressing mode - one in 
each of the four states of an instruction cycle 

Special register considerations (under some conditions): 

r0       hardwired to 0 (always)
r1-r14   DA instruction memory reference (X,Y,Z) pointer registers
r15-r19  DA instruction memory reference (X,Y,Z) increment registers
r20      used by error exception facility to store old pc
r21      stack pointer (sp)
r22      pointer to the exception vector table (evtp)
Data Arithmetic Unit


DAU Consists of: 

32-Bit floating point multiplier 

40-Bit floating point adder 

Four 40-Bit floating point accumulators (a0-a3) 

A clip test register (ctr) 

A control register (dauc) 

The multiplier and adder operate in parallel to perform 12.5 
million computations per second of the form (a=b+c*d) 

The DAU contains a four stage pipeline 

The DAU supports the following floating point formats: 

Single precision (32-Bit) 

Extended single precision (40-Bit) 

Extended single precision uses 8 additional mantissa guard bits 
All normalization is performed automatically 

Single instruction, data type conversions are done in the 
DAU hardware: 

DSP32 and IEEE 32-Bit floating point 
16/32-Bit integer 
8-Bit unsigned 
mu-law and A-law 
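As a point of reference for the mu-law conversion listed above, the following sketch expands one mu-law byte to a linear value using the widely used G.711 expansion arithmetic. The function name is hypothetical, and the output scaling (a 16-bit style range) is an assumption; the source does not state what scaling the DAU hardware conversion produces.

```c
/* Expand one G.711 mu-law byte to a linear sample, returned as a float.
   Assumption: output is scaled like a 16-bit sample (-32124..32124). */
float mulaw_to_float(unsigned char code)
{
    unsigned char u = (unsigned char)~code;   /* mu-law bytes are stored inverted */
    int sign     = u & 0x80;                  /* top bit: sign */
    int exponent = (u >> 4) & 0x07;           /* 3-bit segment number */
    int mantissa = u & 0x0F;                  /* 4-bit step within the segment */
    int sample   = (((mantissa << 3) + 0x84) << exponent) - 0x84;

    return (float)(sign ? -sample : sample);
}
```

On the DSP3210 this kind of conversion is a single instruction, which is the point of the slide; the C version only shows the work that instruction replaces.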
Addressing Modes


Addressing modes by instruction type ("-" marks an empty cell):

                         CA Data      CA Data      CA            DA M/A &
                         Move Group   Move Group   Arithmetic/   Special
Addressing Mode          (CAU Reg)    (I/O Reg)    Logic Group   Func

Short Immediate          Yes          -            -             -
24-Bit Immediate         Yes          -            -             -
Memory Indirect          Yes          -            -             -
CAU Register Direct      Yes          Yes          Yes           -
IO Register Direct       Yes          -            -             -
DAU Register Direct      -            -            -             Yes
Register Indirect        Yes          Yes          -             Yes
Register Indirect with   Yes          Yes          -             Yes
  Postmodification


Notation: 

a0-a3 are the accumulators
r0-r22 are the CAU registers

a0 = r1           ; CAU register direct, store contents of r1 in a0
a0 = r1 + r2      ; add the two numbers in r1 and r2, store result in a0
a0 = *r1 + r2     ; add the number pointed to by r1 to the number in r2, store in a0
a0 = *r1++r2      ; post-modify: increment r1 by r2 after the access, store in a0
a2 = a2 + *r2*a3  ; use that pipeline!
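The post-modified, register-indirect notation above maps naturally onto C pointer arithmetic. The sketch below is a plain C model of a multiply-accumulate loop, with each pointer advanced by its own step after every access. The function and parameter names are hypothetical, and the steps here count array elements, which is an assumption; how the DSP3210 scales its increment registers is not stated in the source.

```c
/* Multiply-accumulate over n terms. After each access the pointers are
   advanced by their own step, mirroring the "*r1++r2" notation. */
float mac_strided(const float *x, int x_step,
                  const float *y, int y_step, int n)
{
    float acc = 0.0f;          /* plays the role of an accumulator such as a0 */

    while (n-- > 0) {
        acc = acc + *x * *y;   /* the a = b + c*d form */
        x += x_step;           /* post-modify the first pointer */
        y += y_step;           /* post-modify the second pointer */
    }
    return acc;
}
```

With a step of 1 this is an ordinary dot product; a step of 2 on one pointer reads every second element, which is how decimating filters or interleaved buffers are typically walked.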


Introduction to the AT&T DSP3210 on the Amiga 3000+ © LCL, Inc. Page 10 







CA Control Instructions 


if (COND) goto {N, rB, rB+N}      Conditional branch based on flags
if (rM-->=0) goto {N, rB, rB+N}   Conditional branch using loop counter
goto {N, rB, rB+N, M, rB+M}       Unconditional branch
nop                               No operation
call {N, rB, rB+N, M} (rM)        Call subroutine
return {rM}                       Return from subroutine
do K, {L, rM}                     Do next K+1 instruction(s) L+1 (or rM+1)
                                  time(s). K=0,1,2,...,127; L=rM=0,1,2,...,2047
dolock K, {L, rM}                 dolock signals interlocked bus access
doblock {L, rM}                   doblock signals quad-word transfers
ireturn                           Return from interrupt
sftrst                            Soft-reset; changes error level to base level;
                                  encoded as spc=(byte)r0
waiti                             Wait for interrupt; encoded as spc=(long)r0


where:

rB = pc, r0-r22
rM = r1-r22
N = 16-Bit signed integer
M = 24-Bit unsigned integer
COND = one of the DSP3210 condition codes
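The loop forms above are easy to miscount by one, so a small C model may help. The first function computes how many instructions a "do K, L" executes according to the description (K+1 instructions, L+1 times). The second counts the passes through a loop body that ends with "if (rM-->=0) goto", reading the test with ordinary C semantics (compare first, then decrement); that reading is an assumption, and both function names are hypothetical.

```c
/* Instructions executed by "do K, L": K+1 instructions, L+1 times.
   Returns -1 if K or L is outside the documented range. */
long do_loop_instructions(int k, int l)
{
    if (k < 0 || k > 127 || l < 0 || l > 2047)
        return -1;
    return (long)(k + 1) * (long)(l + 1);
}

/* Passes through a loop body closed by "if (rM-->=0) goto top",
   assuming the counter is compared first and decremented afterwards. */
int counter_loop_passes(int rm)
{
    int passes = 0;

    do {
        passes++;              /* the loop body */
    } while (rm-- >= 0);       /* branch back while the old value is >= 0 */
    return passes;
}
```

Under that reading, a counter loaded with 0 runs the body twice, so a loop meant to run n times would start with the counter at n-2.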
DA Special Instructions


[Z=] aN = ic(Y)        Input conversion: mu-law, A-law, 8-bit linear to float
[Z=] aN = oc(Y)        Output conversion: float to mu-law, A-law, 8-bit linear
[Z=] aN = float16(Y)   16-bit integer to float
[Z=] aN = float32(Y)   32-bit integer to float
[Z=] aN = int16(Y)     Float to 16-bit integer (round or truncate, dauc[4])
[Z=] aN = int32(Y)     Float to 32-bit integer (round or truncate, dauc[4])
[Z=] aN = round(Y)     Round to nearest, float(40) to float(32)
[Z=] aN = ifalt(Y)     Conditional assignment/memory write
[Z=] aN = ifaeq(Y)     Conditional assignment/memory write
[Z=] aN = ifagt(Y)     Conditional assignment/memory write
[Z=] aN = dsp(Y)       IEEE to DSP format conversion
[Z=] aN = ieee(Y)      DSP to IEEE format conversion
[Z=] aN = seed(Y)      32-bit to 32-bit reciprocal seed


(Y may not be a0-a3 for the dsp special function) 
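A reciprocal seed is normally only a coarse estimate of 1/d, refined in software by Newton-Raphson iteration. The sketch below shows that refinement step in C. The source does not say how accurate seed() is or how many refinement steps are needed, so the seed is simply passed in by the caller and the function name is hypothetical.

```c
/* Refine an estimate x of 1/d with Newton-Raphson: x' = x * (2 - d*x).
   Each step roughly doubles the number of correct bits, provided the
   starting estimate is close enough for the iteration to converge. */
float refine_reciprocal(float d, float x, int steps)
{
    while (steps-- > 0)
        x = x * (2.0f - d * x);
    return x;
}
```

Each step is two multiplies and one subtract, which fits the multiply/add structure of the DAU well; a division a/d then becomes a * refine_reciprocal(d, seed, steps).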
Overview Of Amiga 3000+ DSP3210 Integration
The Development Tools


A full suite of development tools is available for 
developing DSP applications under AmigaDOS: 


d32ar     - Archiver/Librarian
d32as     - Assembler
d32cc     - C Compiler
d32cpp    - C Preprocessor
d32dump   - Dump Section Information
d32ld     - Link Editor
d32make   - Maintain and Update Related Files
d32nm     - Print Name List of Object Files
d32optim  - Code Optimizer
d32sect   - Relocatable Code Section Identifier
d32size   - Print Section Size of Object Files
d32strip  - Strip Symbol Information
d32trans  - DSP32C Object Code Translator
The Directory Structure


The DSP3210 SGS (Software Generation System): 



Environment Variables: 


DSP3210SL          - The root (SGS) directory
DSP3210_AsmPP      - Which preprocessor to use
DSP3210_Aux        - Where the aux files are
DSP3210_Includes   - Where the include files are
DSP3210_Libraries  - Where the libraries are
DSP3210_Temp       - Where to place temp files
DSP3210_Tools      - Where the tools are


The aux files include macros and help files for the simulator 
and binary files containing boot code for the DSP3210. 
Memory Sections


Two basic memory sections: 

On-chip: 

RAM0
RAM1
Boot ROM 

Off Chip: 

External Memory A 
External Memory B 

Information on how to divide a program into sections is placed in a
memory map (called an ifile). Most applications will use the default,
which places them in external memory and lets the operating
system/environment decide when to use on-chip memory for
code, data, or buffers. The default ifile appears as follows:

MEMORY {
    .mextA:   o=0x0,        l=0x50030000
    .mbrom:   o=0x50030000, l=0x400
    .mram1:   o=0x5003e000, l=0x1000
    .mram0:   o=0x5003f000, l=0x1000
    .mextAhi: o=0x50040000, l=0xffc0000
    .mextB:   o=0x60000000, l=0x9fffffff
}

SECTIONS {
    .brom:   {} > .mbrom
    .ram1:   {} > .mram1
    .ram0:   {} > .mram0
    .extA:   {} > .mextA
    .extAhi: {} > .mextAhi
    .extB:   {} > .mextB
}
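To make the layout of the default memory map easier to see, the sketch below encodes the six MEMORY regions as a C table and looks up which region an address falls in. The origins and lengths are taken from the default ifile, reading "o" as origin and "l" as length; the struct and function names are hypothetical and are not part of the DSP3210 tools.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    const char *name;
    uint32_t    origin;   /* "o=" in the ifile */
    uint32_t    length;   /* "l=" in the ifile */
};

static const struct region default_map[] = {
    { ".mextA",   0x00000000u, 0x50030000u },
    { ".mbrom",   0x50030000u, 0x00000400u },
    { ".mram1",   0x5003e000u, 0x00001000u },
    { ".mram0",   0x5003f000u, 0x00001000u },
    { ".mextAhi", 0x50040000u, 0x0ffc0000u },
    { ".mextB",   0x60000000u, 0x9fffffffu },
};

/* Return the name of the region containing addr, or NULL if the
   address falls in a gap between regions. */
const char *region_for(uint32_t addr)
{
    size_t i;

    for (i = 0; i < sizeof default_map / sizeof default_map[0]; i++) {
        const struct region *r = &default_map[i];
        /* addr - origin cannot wrap once addr >= origin is known */
        if (addr >= r->origin && addr - r->origin < r->length)
            return r->name;
    }
    return NULL;
}
```

The two on-chip RAM regions are 0x1000 bytes each, 0x2000 bytes in total, which matches the 2048 32-bit words of on-chip RAM quoted in the architecture overview. The range between the end of the boot ROM (0x50030400) and the start of the first RAM region is not covered by any region.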
Sample C Application


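/* mat2x2.c */

float A[2][2] = {
    1.0, 2.0,
    /* one value is missing in the source here */ 6.0
};

float B[2][2] = {
    1.0, 2.0,
    2.0, 3.0
};

float C[2][2];

main()
{
    void mat2x2();              /* external routine, defined elsewhere */
    register float *a = A[0];   /* pointer to the first element of A */
    register float *b = B[0];   /* pointer to the first element of B */
    register float *c = C[0];   /* pointer to the first element of C */

    mat2x2(a, b, c);
}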
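The sample declares mat2x2() but does not show it. Assuming it multiplies the two 2x2 matrices and stores the product through the third pointer (an assumption, since the routine itself is not in the source), a plain C reference version could look like the sketch below. The name mat2x2_ref is hypothetical and is chosen so it cannot be mistaken for the original routine.

```c
/* Reference 2x2 matrix multiply, c = a * b.
   All three matrices are stored row by row in four consecutive floats. */
void mat2x2_ref(const float *a, const float *b, float *c)
{
    c[0] = a[0] * b[0] + a[1] * b[2];   /* row 0, column 0 */
    c[1] = a[0] * b[1] + a[1] * b[3];   /* row 0, column 1 */
    c[2] = a[2] * b[0] + a[3] * b[2];   /* row 1, column 0 */
    c[3] = a[2] * b[1] + a[3] * b[3];   /* row 1, column 1 */
}
```

Every output element has the sum-of-products form that the DAU computes directly, which is why a small routine like this is a natural candidate for hand-written DSP3210 assembler.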
VCOS Overview


VCOS - "Visible Caching Operating System" 

VCOS provides a platform-independent environment for 
multitasking applications on a single DSP (and in the future, 
multiple DSPs). VCOS consists of four parts:

VCOS - VCOS real-time DSP scheduler 

VCAS - VCOS Application Server 

VCD - VCOS Debugger 

VCOS Basic and Enhanced Module Library 

VCOS runs on the DSP and has one very basic function. It 
traverses an execution list and executes DSP code modules in 
turn at a pre-determined frame rate or quantum (every 10ms). 
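The idea of traversing an execution list can be sketched in a few lines of C. This is only a conceptual model of the description above: the types, field names and functions are all hypothetical and say nothing about how VCOS is actually implemented.

```c
#include <stddef.h>

/* One entry on a (hypothetical) execution list. */
typedef void (*module_fn)(void *state);

struct module {
    module_fn      run;     /* code to execute this frame */
    void          *state;   /* per-module data */
    struct module *next;    /* next entry, or NULL at the end */
};

/* Run every module on the list once, in order. Returns how many ran.
   A scheduler would call this once per frame. */
int run_frame(struct module *head)
{
    int count = 0;
    struct module *m;

    for (m = head; m != NULL; m = m->next) {
        m->run(m->state);
        count++;
    }
    return count;
}

/* Example module: counts how many times it has been called. */
void count_calls(void *state)
{
    ++*(int *)state;
}
```

Because each module runs to completion before the next one starts, every module must finish its work well within the frame time for the whole list to keep up.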

VCAS provides routines for the host system (the Amiga) to 
manage DSP tasks, communicate between DSP tasks and host 
applications, etc. Some sample functions include: 

TaskLoad(), TaskStart(), TaskStop()
FifoRead(), FifoWrite()

VCD provides a full symbolic debugging environment for DSP 
applications running under VCOS. Some of the key features of 
VCD include: 

Symbolic Disassembly 
Breakpoints and single step facilities 
Task and Module Status 
Real-time Simulations Using File I/O 

VCD is used by DSP application programmers or very advanced 
users to manage and debug DSP tasks. VCD makes calls to 
VCAS and uses special DSP Code Modules to accomplish its 
functions. 
VCOS Functional Model


VCOS applications are written most efficiently in the DSP3210 Assembler, or 
using the DSP3210 C Compiler. The DSP is not a general purpose CPU, and is
best suited to performing signal processing or repetitive mathematical
operations. Application programmers must identify the routines or
algorithms in their code that are best suited to the DSP. These routines are
then implemented on the DSP and used as "subtasks" of their main
applications.

All code loading and execution preparation is managed by the host. The DSP 
shares the host memory space and also has high-speed local (on-chip) memory. 
Critical applications or sections can be executed in on-chip memory for 
performance. Sections may be swapped in or out of on-chip memory on demand 
as the execution list of the various tasks is traversed. This feature allows 
fine-tuning DSP applications to obtain maximum performance at very low system 
cost. The DSP can see host memory and cache critical sections it needs in 
on-chip memory. The application programmer sees this caching mechanism
directly and codes to take advantage of it, hence the term "Visible Caching". A
Module may load state information before execution and save state before exiting 
or relinquishing the DSP. There are three distinct address spaces under the 
VCOS DSP/Host model: 

Host address- host memory (physical, contiguous by section, locked) 

DSP address- DSP physical address (mapped into host memory) 

Execute/Cache address - (location in on-chip memory) 

Modules exist in either host memory or DSP memory. Cached Modules are 
present and running in on-chip DSP memory. A Module may be represented in all 
three address spaces at any given time, for example: 

Host loads DSP code into its memory space (shared by DSP).

Host "downloads" code to DSP memory space (no move, just a translation
and possible relocation).

DSP caches code and begins executing out of on-chip memory. 

Modules may be declared as cacheable or non-cacheable. If a user attempts to 
cache a non-cacheable section, the loader will flag an error. There are two types 
of caching available: 

• Auto-caching 

• Demand-caching 
VCOS Functional Model (cont’d)


Module sections which are auto-caching are automatically placed in on-chip 
memory. If there is no more room to cache an auto-cache Module, the loader will 
flag an error. When using demand-caching, programmers may use macros in 
their code to explicitly cache or unload sections. 
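The two loader errors described here (caching a non-cacheable section, and running out of on-chip room for an auto-cache section) can be modelled with a simple space check. Everything in the sketch below is hypothetical: the names, the error codes, and the assumption that all 0x2000 bytes of on-chip RAM are available for caching.

```c
/* Assumption: the full 2048 words (0x2000 bytes) of on-chip RAM
   are available to cached sections. */
#define ONCHIP_BYTES 0x2000u

enum cache_status { CACHE_OK, CACHE_NOT_CACHEABLE, CACHE_NO_ROOM };

struct onchip {
    unsigned used;   /* bytes of on-chip RAM already claimed */
};

/* Try to place one auto-cache section of `size` bytes on chip. */
enum cache_status auto_cache(struct onchip *mem, unsigned size, int cacheable)
{
    if (!cacheable)
        return CACHE_NOT_CACHEABLE;       /* loader flags an error */
    if (size > ONCHIP_BYTES - mem->used)
        return CACHE_NO_ROOM;             /* loader flags an error */
    mem->used += size;
    return CACHE_OK;
}
```

Demand-caching avoids the second error by letting the program decide at run time which sections occupy on-chip memory, at the cost of explicit load and unload steps in the code.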

DSP programs are written as efficiently as possible and for very specific 
functions. These functions, called "Modules", can be combined together to create
larger applications, or "Tasks". An example DSP application would look 
something like this: 


[Figure: the Task "Answering Machine" shown on both the DSP side and the
Host side. Labels in the figure: VCOS Modules, Buffer, User Interface,
DTMF Decoding, DSP Task Management, Call Progress Detect, CODEC Device
Driver, VCAS.]



Modules are executed from one of two execution lists: foreground or interrupt. 
The foreground execution list is executed in round-robin fashion 
(non-preemptive). The interrupt execution list is executed once per external 
interrupt and is used for time-critical I/O tasks (the code is guaranteed to be 
executed when an interrupt occurs). Tasks on the foreground execution list are 
executed at the system frame rate (or quantum) and are said to be 
"frame-synchronous". This is also called "block processing" since each task is 
designed to operate on a fixed size block of data during the frame. Writing code to take
advantage of this block processing results in efficient and deterministic data-flow.
This allows reliable multitasking for DSP algorithms under VCOS. Data is passed 
between Modules and between Modules and the host using "Buffers". There are 
three types of Buffers under VCOS: 

AIAO - All In All Out (frame synchronous, cacheable, random access) 
FIFO - First In First Out (asynchronous) 

PARAMS - (local shared, random access) 

AIAO Buffers are typically used for critical real-time DSP I/O. These buffers are 
static and reside in on-chip memory (cached). They are serviced every frame 
under VCOS and are considered to be "real-time". AIAOs manage a fixed size
data stream. FIFO buffers are named and may be accessed among multiple 
Modules or Tasks. These buffers are asynchronous in nature and may not get 
serviced every frame (non real-time). FIFO Buffers are used for managing 
sequentially accessed data streams (e.g. audio samples). PARAMS Buffers can 
contain any random-access data required by the application and can be used to 
map host physical addresses into a DSP application. PARAMS Buffers have a 
fixed size, unlike AIAO and FIFO Buffers which can have their size set at load 
time. Both FIFO and PARAMS Buffers can be used for inter-Module and
Module-host communication. 
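A FIFO Buffer of the kind described here behaves like a ring buffer: a writer appends samples when it can, and a reader drains them later, without the two being locked to the same frame. The sketch below is a generic single-threaded ring buffer in C. The lowercase names and the capacity are hypothetical, and it is not the VCAS FifoRead()/FifoWrite() interface, whose signatures are not given in the source.

```c
#include <stddef.h>

#define FIFO_CAPACITY 8   /* hypothetical size, in samples */

struct fifo {
    float  data[FIFO_CAPACITY];
    size_t head;    /* index of the oldest sample */
    size_t count;   /* number of samples currently stored */
};

/* Append up to n samples; returns how many actually fit. */
size_t fifo_write(struct fifo *f, const float *src, size_t n)
{
    size_t written = 0;

    while (written < n && f->count < FIFO_CAPACITY) {
        f->data[(f->head + f->count) % FIFO_CAPACITY] = src[written++];
        f->count++;
    }
    return written;
}

/* Remove up to n samples, oldest first; returns how many were read. */
size_t fifo_read(struct fifo *f, float *dst, size_t n)
{
    size_t taken = 0;

    while (taken < n && f->count > 0) {
        dst[taken++] = f->data[f->head];
        f->head = (f->head + 1) % FIFO_CAPACITY;
        f->count--;
    }
    return taken;
}
```

Both calls report how much they actually transferred, so a producer that runs ahead of its consumer sees a short write rather than overwriting unread data.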

VCOS also provides for special "Device Driver" Modules. These Modules are
implemented to take advantage of system specific hardware (e.g. CODECs) and 
have standard calling interfaces. Once a Device Driver Module is written for a 
specific piece of hardware, any VCOS Module can take advantage of it for I/O. 
VCOS Library Features


Standard VCOS Modules include: 

Integer sample rate converter 
Non-integer sample rate converter 
16/24Kbps subband coder 
G.722 7 kHz speech coder
4.7Kbps CELP coder 

DTMF generator/detector 
Call progress detector 

Delta-Cepstrum feature extractor 
Text to phones, phones to LPC, LPC to speech 
Speaker trained, isolated word recognizer 
Speaker independent connected digit recognizer 
Talker verification 

3D graphics library 
Perceptual image coder (PIC) 

Perceptual music coder 

JPEG still image coder 

MPEG image coder (non real-time) 

MPEG audio coder 

V.22bis MNP5 Modem (V.22, V.22bis, Bell 212A, V.23, V.21,
Bell 103)

Enhanced Software Pack: 

V.32 Modem with fallback 
V.29 G3 Fax Modem with fallback 
Amiga VCOS Implementation

Multiple software layers: 

dsp3210.resource - low level hardware access, resource allocation

dsp.device - low level control operations 
dsp.library - VCAS shared library 